English-Chinese Transliteration Word Pair Extraction from Parallel Corpora

Authors

  • Chengguo Jin
  • Seung-Hoon Na
  • Dong-Il Kim
  • Jong-Hyeok Lee
Abstract

Bilingual dictionary construction is a time-consuming job; therefore, many recent studies have focused on automatically constructing bilingual dictionaries from bilingual texts. In this paper, we propose two novel approaches, called dynamic window and tokenizer, based on a statistical machine transliteration model, to efficiently extract English-Chinese transliteration pairs from parallel corpora. Our approaches achieve a significant improvement over previous methods without using resources such as a Chinese segmenter or pronunciation dictionaries; hence, these approaches can be readily applied to other language pairs.

Similar resources

Automatic Extraction of Translational Japanese-KATAKANA and English Word Pairs

A method is proposed for automatically extracting translational Japanese-KATAKANA and English word pairs from bilingual corpora. The method applies all existing transliteration rules to each mora unit in a KATAKANA word and extracts, as the translation, an English word that matches or partially matches one of these transliteration candidates. For instance, if there is a word ‘グラフ’ (graph) in Japan...

An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment

We propose a language-independent method for the automatic extraction of transliteration pairs from parallel corpora. In contrast to previous work, our method uses no form of supervision, and does not require linguistically informed preprocessing. We conduct experiments on data sets from the NEWS 2010 shared task on transliteration mining and achieve an F-measure of up to 92%, outperforming mos...

Bilingual Dictionary Construction with Transliteration Filtering

In this paper we present a bilingual transliteration lexicon of 170K Japanese-English technical terms in the scientific domain. Translation pairs are extracted by filtering a large list of transliteration candidates generated automatically from a phrase table trained on parallel corpora. Filtering uses a novel transliteration similarity measure based on a discriminative phrase-based machine tra...

Acquisition of English-Chinese Transliterated Word Pairs from Parallel-Aligned Texts using a Statistical Machine Transliteration Model

This paper presents a framework for extracting English and Chinese transliterated word pairs from parallel texts. The approach is based on the statistical machine transliteration model to exploit the phonetic similarities between English words and corresponding Chinese transliterations. For a given proper noun in English, the proposed method extracts the corresponding transliterated word from t...

Collocation Extraction Using Monolingual Word Alignment Method

Statistical bilingual word alignment has been well studied in the context of machine translation. This paper adapts the bilingual word alignment algorithm to a monolingual scenario to extract collocations from a monolingual corpus. The monolingual corpus is first replicated to generate a parallel corpus, where each sentence pair consists of two identical sentences in the same language. Then the mon...

Journal title:
  • Int. J. Comput. Proc. Oriental Lang.

Volume 21  Issue 

Pages  -

Publication date 2008